A Constant-Factor Approximation Algorithm for Co-clustering

Authors

  • Aris Anagnostopoulos
  • Anirban Dasgupta
  • Ravi Kumar
Abstract

Co-clustering is the simultaneous partitioning of the rows and columns of a matrix such that the blocks induced by the row/column partitions are good clusters. Motivated by several applications in text mining, market-basket analysis, and bioinformatics, this problem has attracted a lot of attention in the past few years. Unfortunately, to date, most of the algorithmic work on this problem has been heuristic in nature. In this work we obtain the first approximation algorithms for the co-clustering problem. Our algorithms are simple and provide constant-factor approximations to the optimum. We also show that co-clustering is NP-hard, thereby complementing our algorithmic result.

ACM Classification: F.2.0
AMS Classification: 68W25
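To make the notion of "blocks that are good clusters" concrete, the following minimal sketch scores a given row/column partition of a small matrix. It assumes a sum-of-squared-deviations objective, in which each block is summarized by its mean; the abstract does not state which objective the paper uses, so this choice, the function name `coclustering_cost`, and the toy matrix are all illustrative assumptions rather than the authors' method.

```python
import numpy as np

def coclustering_cost(A, row_labels, col_labels):
    """Score a co-clustering (assumed objective, not taken from the paper):
    sum over all blocks of squared deviations from the block mean."""
    A = np.asarray(A, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    cost = 0.0
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            # Block induced by row cluster r and column cluster c.
            block = A[np.ix_(row_labels == r, col_labels == c)]
            cost += ((block - block.mean()) ** 2).sum()
    return cost

# Hypothetical 3x4 matrix with an obvious block structure.
A = np.array([[1, 1, 5, 5],
              [1, 1, 5, 5],
              [9, 9, 3, 3]])

# Partition that matches the block structure: every block is constant.
good = coclustering_cost(A, [0, 0, 1], [0, 0, 1, 1])
# Partition that mixes unrelated rows and columns.
bad = coclustering_cost(A, [0, 1, 1], [0, 1, 0, 1])
print(good, bad)
```

Under this assumed objective, a partition aligned with the matrix structure scores lower than one that mixes rows and columns. An approximation algorithm in the sense of the abstract would return a partition whose cost is within a constant factor of the best achievable cost.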


Related resources

Approximation Algorithms for Tensor Clustering

We present the first (to our knowledge) approximation algorithm for tensor clustering—a powerful generalization to basic 1D clustering. Tensors are increasingly common in modern applications dealing with complex heterogeneous data and clustering them is a fundamental tool for data analysis and pattern discovery. Akin to their 1D cousins, common tensor clustering formulations are NP-hard to opti...


A Constant-Factor Bi-Criteria Approximation Guarantee for k-means++

This paper studies the k-means++ algorithm for clustering as well as the class of D^ℓ sampling algorithms to which k-means++ belongs. It is shown that for any constant factor β > 1, selecting βk cluster centers by D^ℓ sampling yields a constant-factor approximation to the optimal clustering with k centers, in expectation and without conditions on the dataset. This result extends the previously known...


A Constant Approximation for Streaming k-means

This article gives a constant-factor approximation algorithm for streaming k-means that uses O(k log n) space.


Approximation Algorithms for Bregman Co-clustering and Tensor Clustering

In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9, 17], and tensor clustering [8, 32]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximat...


Hierarchical Clustering via Spreading Metrics

We study the cost function for hierarchical clusterings introduced by [Dasgupta, 2016] where hierarchies are treated as first-class objects rather than deriving their cost from projections into flat clusters. It was also shown in [Dasgupta, 2016] that a top-down algorithm returns a hierarchical clustering of cost at most O(α_n log n) times the cost of the optimal hierarchical clustering, where ...



Journal:
  • Theory of Computing

Volume 8


Publication date: 2012